To address the inaccurate localization of salient objects in previous weakly supervised salient object detection algorithms, a weakly supervised salient object detection algorithm based on bounding box annotation was proposed. In the proposed algorithm, the minimum bounding rectangles of all objects in the image were adopted as supervision information. Firstly, an initial saliency map was generated from the bounding box annotation with the GrabCut algorithm. Then, a correction module for missing objects was designed to obtain an optimized saliency map. Finally, combining the advantages of traditional methods and deep learning methods, the optimized saliency map was used as the pseudo ground truth to train a salient object detection model with a neural network. The proposed algorithm was compared with six unsupervised and four weakly supervised saliency detection algorithms on four public datasets. Experimental results show that the proposed algorithm significantly outperforms the comparison algorithms in both maximum F-measure (Max-F) and Mean Absolute Error (MAE) on all four datasets. Compared with SBB (Saliency Bounding Boxes), which is also a weakly supervised method based on bounding box annotation, the proposed algorithm uses a simpler annotation method; on the four datasets ECSSD, DUTS-TE, HKU-IS and DUT-OMRON, its Max-F increased by 1.82%, 4.00%, 1.27% and 5.33% respectively, and its MAE decreased by 13.89%, 15.07%, 8.77% and 13.33% respectively. These results indicate that the proposed algorithm is a weakly supervised salient object detection algorithm with good detection performance.
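The two evaluation metrics named in the abstract above, Max-F and MAE, can be sketched as follows. This is a minimal NumPy illustration, not the authors' evaluation code: the function names, the weighting beta^2 = 0.3 and the 255 evenly spaced thresholds are assumptions that follow common practice in saliency benchmarks.

```python
import numpy as np


def mean_absolute_error(saliency, ground_truth):
    """MAE between a saliency map in [0, 1] and a binary ground-truth mask."""
    saliency = np.asarray(saliency, dtype=np.float64)
    ground_truth = np.asarray(ground_truth, dtype=np.float64)
    return float(np.mean(np.abs(saliency - ground_truth)))


def max_f_measure(saliency, ground_truth, beta_sq=0.3, num_thresholds=255):
    """Largest F-measure obtained over a sweep of binarization thresholds.

    beta_sq = 0.3 and the threshold grid are assumed conventions,
    not values taken from the abstract.
    """
    saliency = np.asarray(saliency, dtype=np.float64)
    gt = np.asarray(ground_truth) > 0.5
    best = 0.0
    for t in np.linspace(0.0, 1.0, num_thresholds, endpoint=False):
        pred = saliency > t
        tp = np.logical_and(pred, gt).sum()
        if tp == 0:
            continue  # precision and recall are both zero at this threshold
        precision = tp / pred.sum()
        recall = tp / gt.sum()
        f = (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)
        best = max(best, float(f))
    return best
```

A higher Max-F and a lower MAE both indicate a saliency map closer to the ground truth, which is why the abstract reports an increase for the former and a decrease for the latter.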
In intelligent parking space management systems, factors such as illumination changes and parking space occlusion can reduce the accuracy and effectiveness of parking space prediction. To overcome this problem, a parking space detection method based on a self-supervised HOG (Histogram of Oriented Gradients) prediction auxiliary task was proposed. Firstly, a self-supervised auxiliary task that predicts the HOG features of the occluded part of the image was designed, and the MobileViTBlock (light-weight, general-purpose, and Mobile-friendly Vision Transformer Block) was used to integrate the global information of the image, so that the visual representation of the image was learned more fully and the feature extraction ability of the model was improved. Then, the SE (Squeeze-and-Excitation) attention mechanism was improved, enabling the model to match or even exceed the effect of the original SE attention mechanism at a lower computational cost. Finally, the feature extraction part trained on the auxiliary task was applied to the downstream classification task of parking space status prediction. Experiments were carried out on a mixed dataset of PKLot and CNRPark. The experimental results show that the proposed model reaches an accuracy of 97.49% on the test set; compared with RepVGG, its accuracy of occlusion prediction improves by 5.46 percentage points, which represents a great improvement over other parking space detection algorithms.
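For reference, the original SE attention mechanism that the abstract above takes as its starting point can be sketched as follows. This is a minimal NumPy illustration of the standard squeeze, excitation and rescaling steps only; the abstract does not describe the improved variant, so it is not reproduced here, and the function name and weight shapes are assumptions.

```python
import numpy as np


def se_block(features, w1, w2):
    """Original Squeeze-and-Excitation channel attention (sketch).

    features: array of shape (C, H, W)
    w1:       reduction weights of shape (C // r, C)   (hypothetical values)
    w2:       expansion weights of shape (C, C // r)   (hypothetical values)
    Returns the rescaled features and the per-channel gates.
    """
    features = np.asarray(features, dtype=np.float64)
    squeezed = features.mean(axis=(1, 2))          # squeeze: global average pooling
    hidden = np.maximum(w1 @ squeezed, 0.0)        # excitation, step 1: FC + ReLU
    gates = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))   # excitation, step 2: FC + sigmoid
    return features * gates[:, None, None], gates  # rescale each channel
```

The two fully connected layers account for most of the cost of the block, so a cheaper variant would plausibly target them, although the abstract does not say which part was changed.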
Traditional clustering methods operate in the original data space, and the data to be clustered are high-dimensional. To solve these two problems, a new binary image clustering method, Clustering based on Discrete Hashing (CDH), was proposed. To reduce the dimension of the data, the L2,1-norm was used in this framework to realize adaptive feature selection. At the same time, the data were mapped into the binary Hamming space by a hashing method. Then, the sparse binary matrix was decomposed into a low-rank matrix in the Hamming space to complete fast image clustering. Finally, an optimization scheme that converges quickly was used to solve the objective function. Experimental results on image datasets (Caltech101, Yale, COIL20, ORL) show that this method can effectively improve the efficiency of clustering. Compared with traditional clustering methods such as K-means and Spectral Clustering (SC), the time efficiency of CDH was improved by 87 and 98 percentage points respectively in the Gabor view of the Caltech101 dataset when processing high-dimensional data.
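Two building blocks mentioned in the abstract above, the L2,1-norm and the mapping into Hamming space, can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the CDH objective itself: the L2,1-norm is taken as the sum of row-wise Euclidean norms, the hash is a plain sign of a linear projection, and the projection matrix and all function names are hypothetical.

```python
import numpy as np


def l21_norm(w):
    """L2,1-norm: sum of the Euclidean norms of the rows of w.

    Penalizing this norm pushes whole rows towards zero, which is what
    makes it usable for feature selection.
    """
    w = np.asarray(w, dtype=np.float64)
    return float(np.sqrt((w ** 2).sum(axis=1)).sum())


def binarize(x, projection):
    """Map real-valued samples to {-1, +1} codes (assumed sign hashing)."""
    return np.where(np.asarray(x) @ np.asarray(projection) >= 0, 1, -1)


def hamming_distance(code_a, code_b):
    """Number of positions in which two binary codes differ."""
    return int(np.count_nonzero(np.asarray(code_a) != np.asarray(code_b)))
```

Distances between binary codes reduce to counting differing bits, which is the source of the speed advantage that clustering in Hamming space has over clustering in the original high-dimensional space.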
A single Long Short-Term Memory (LSTM) network cannot effectively extract key information or accurately fit the data distribution in trajectory prediction. To solve these problems, a short-term aircraft trajectory prediction model based on the attention mechanism and Generative Adversarial Network (GAN) was proposed. Firstly, different weights were assigned to the trajectory by introducing the attention mechanism, so that the influence of important features in the trajectory was strengthened. Secondly, the trajectory sequence features were extracted with an LSTM, and a convergence network was used to gather the features of all aircraft within each time step. Finally, the continuous optimization of the GAN through its adversarial game was used to optimize the model and improve its accuracy. Compared with Social Generative Adversarial Network (SGAN), the proposed model reduces the Average Displacement Error (ADE), Final Displacement Error (FDE) and Maximum Displacement Error (MDE) by 20.0%, 20.4% and 18.3% respectively on the climb-phase dataset. Experimental results show that the proposed model can predict future trajectories more accurately.
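The three displacement metrics reported in the abstract above can be sketched as follows. This is a minimal NumPy illustration for a single trajectory, not the authors' evaluation code: the function name is hypothetical, and MDE is assumed to be the largest per-step Euclidean error.

```python
import numpy as np


def displacement_errors(predicted, actual):
    """ADE, FDE and MDE for one trajectory.

    predicted, actual: arrays of shape (T, D), one position per time step.
    """
    predicted = np.asarray(predicted, dtype=np.float64)
    actual = np.asarray(actual, dtype=np.float64)
    per_step = np.linalg.norm(predicted - actual, axis=-1)  # Euclidean error at each step
    return {
        "ADE": float(per_step.mean()),  # average over all predicted steps
        "FDE": float(per_step[-1]),     # error at the final predicted step
        "MDE": float(per_step.max()),   # assumed: worst error over all steps
    }
```

ADE summarizes the whole predicted path, while FDE and MDE expose drift at the end of the horizon and the worst single deviation, so a model can improve on one without improving on the others.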
The development of pre-trained language models has greatly promoted progress in machine reading comprehension. To make full use of the shallow features of the pre-trained language model and further improve the answer prediction accuracy of question answering models, a three-stage question answering model based on Bidirectional Encoder Representations from Transformers (BERT) was proposed. Firstly, three stages, pre-answering, re-answering and answer-adjusting, were designed based on BERT. Secondly, the inputs of the embedding layer of BERT were treated as shallow features to pre-generate an answer in the pre-answering stage. Then, the deep features fully encoded by BERT were used to generate another answer in the re-answering stage. Finally, the final prediction was generated by combining the previous two answers in the answer-adjusting stage. Experimental results on the span-extraction question answering task, using the English dataset Stanford Question Answering Dataset 2.0 (SQuAD2.0) and the Chinese dataset Chinese Machine Reading Comprehension 2018 (CMRC2018), show that the Exact Match (EM) and F1 score (F1) of the proposed model are improved by 1 to 3 percentage points on average compared with those of similar baseline models, and that the model extracts more accurate answer spans. By combining the shallow features of BERT with its deep features, this three-stage model extends the abstract representation ability of BERT and explores the application of shallow BERT features in question answering models, while offering a simple structure, accurate prediction, and fast training and inference.